84 research outputs found

    X86_64 vs Aarch64 Performance Validation with COTSon

    In this study, we provide a set of architectural parameters for the HPLabs COTSon simulator that can be used to model existing processors, such as the Intel i7700 (X86_64 architecture) and the ARM A53 (Aarch64 architecture). We carry out an initial validation by comparing execution times under weak scaling for two common benchmarks, chosen for simplicity: Recursive Fibonacci and Matrix Multiplication. Using the simulator, we can then study the sensitivity of the architecture further and determine which features matter most for performance. Our goal here is to verify that the COTSon simulator can be used to model both the X86_64 and Aarch64 architectures. Based on this validation study, we can analyze the bottlenecks and desirable microarchitectural features of modern architectures.
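
    For orientation, the two benchmark kernels named in this abstract can be sketched as below. This is a minimal illustration, not the authors' benchmark code: the function names, the list-of-lists matrix layout, and the loop order are assumptions made for the example.

```python
def fib(n: int) -> int:
    """Naive recursive Fibonacci: exponential call tree, almost no memory traffic."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)


def matmul(a, b):
    """Naive dense matrix multiply on lists of lists (hypothetical layout)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    if any(len(row) != inner for row in a):
        raise ValueError("inner dimensions do not match")
    c = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for k in range(inner):
            aik = a[i][k]
            for j in range(cols):
                c[i][j] += aik * b[k][j]
    return c


fib_result = fib(10)
product = matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

    The first kernel stresses call overhead and branch behaviour, while the second stresses the memory hierarchy, which is one plausible reason for pairing them in a validation study.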

    Reconfigurable Logic Interface Architecture for CPU-FPGA Accelerators

    Programmable System-on-Chips (SoCs) are a flexible solution for offloading part of the computation from the CPU to the FPGA and reducing execution time. In today's ARM-based SoCs, the CPU and FPGA are usually connected through several different communication links based on the AMBA standard. This paper presents two possible designs for reconfigurable logic interface architectures, to be employed as a high-performance interface module in programmable logic accelerators. These designs provide programmable bidirectional data communication paths between the CPU memory-mapped master interface and the FPGA. Our first proposed design offers up to 32 configurable registers, while the second offers up to 32 configurable FIFOs for exchanging larger amounts of data. Both architectures communicate with programmable logic accelerators through data stream channels.

    From COTSon to HLS: translating timing into an architecture

    Nowadays, the increasing number of cores benefits many workloads, but programming limitations still prevent exploiting their full performance. A Data-Flow execution model is capable of taking advantage of the full parallelism offered by multicore systems. In such a model, execution can be decomposed into fine-grain threads named Data-Flow Threads (DF-Threads), each of which can execute only when its inputs are available. Execution overhead and power consumption are lowered thanks to the reduction of data push-pull and of the thread-management burden.
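
    The firing rule described in this abstract (a thread becomes runnable only once all of its inputs have arrived) can be sketched in a few lines. This is a toy software model under stated assumptions, not the DF-Threads API: the names `DFThread`, `Scheduler`, `schedule`, and `write` are hypothetical and chosen only for the illustration.

```python
from collections import deque


class DFThread:
    """A data-flow thread with a frame of inputs and a pending-input counter."""

    def __init__(self, name, num_inputs, body):
        self.name = name
        self.pending = num_inputs  # inputs still missing
        self.frame = {}            # slot name -> value
        self.body = body           # callable(scheduler, frame)


class Scheduler:
    """Toy scheduler: a thread enters the ready queue when pending reaches 0."""

    def __init__(self):
        self.ready = deque()
        self.trace = []  # order in which threads actually ran

    def schedule(self, name, num_inputs, body):
        thread = DFThread(name, num_inputs, body)
        if thread.pending == 0:
            self.ready.append(thread)
        return thread

    def write(self, thread, slot, value):
        if slot in thread.frame:
            raise ValueError(f"slot {slot!r} already written")
        thread.frame[slot] = value
        thread.pending -= 1
        if thread.pending == 0:
            self.ready.append(thread)

    def run(self):
        while self.ready:
            thread = self.ready.popleft()
            self.trace.append(thread.name)
            thread.body(self, thread.frame)


sched = Scheduler()
results = {}
# The consumer is created first but cannot run until both inputs are written.
adder = sched.schedule("add", 2, lambda s, f: results.update(sum=f["a"] + f["b"]))
sched.schedule("produce_a", 0, lambda s, f: s.write(adder, "a", 3))
sched.schedule("produce_b", 0, lambda s, f: s.write(adder, "b", 4))
sched.run()
```

    Although `add` is scheduled first, it runs last, because the ready queue only ever contains threads whose inputs are complete.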

    Translating Timing into an Architecture: The Synergy of COTSon and HLS (Domain Expertise: Designing a Computer Architecture via HLS)

    Translating a system requirement into a low-level representation (e.g., register transfer level or RTL) is the typical goal of the design of FPGA-based systems. However, the Design Space Exploration (DSE) needed to identify the final architecture may be time consuming, even when using high-level synthesis (HLS) tools. In this article, we illustrate our hybrid methodology, which uses a frontend for HLS so that the DSE is performed more rapidly by using a higher level of abstraction, but without losing accuracy, thanks to the HP-Labs COTSon simulation infrastructure in combination with our DSE tools (MYDSE tools). In particular, the proposed methodology proved useful for achieving an appropriate design of a whole system in a shorter time than trying to design everything directly in HLS. Our motivating problem was to deploy a novel execution model called data-flow threads (DF-Threads) running on yet-to-be-designed hardware. For that goal, using HLS directly was premature in the design cycle. Therefore, a key point of our methodology consists of defining the first prototype in our simulation framework and gradually migrating the design into the Xilinx HLS after validating the key performance metrics of our novel system in the simulator. To explain this workflow, we first use a simple driving example consisting of the modelling of a two-way associative cache. Then, we explain how we generalized this methodology and describe the types of results that we were able to analyze in the AXIOM project, which helped us reduce the development time from months/weeks to days/hours.
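
    As a rough idea of what the driving example involves, a two-way associative cache can be modelled functionally as below. This is a sketch under assumptions, not the article's model: the class name, the default sizes, and the LRU replacement policy are all chosen for the illustration, since the abstract does not specify them.

```python
class TwoWayCache:
    """Functional model of a two-way set-associative cache (LRU is an assumption)."""

    def __init__(self, num_sets=64, line_size=64):
        self.num_sets = num_sets
        self.line_size = line_size
        # Each set holds at most two tags, least recently used first.
        self.sets = [[] for _ in range(num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        """Look up a byte address; return True on a hit, False on a miss."""
        block = addr // self.line_size
        index = block % self.num_sets
        tag = block // self.num_sets
        ways = self.sets[index]
        if tag in ways:
            ways.remove(tag)   # move to most-recently-used position
            ways.append(tag)
            self.hits += 1
            return True
        if len(ways) == 2:
            ways.pop(0)        # evict the least recently used way
        ways.append(tag)
        self.misses += 1
        return False


# Addresses 0, 64 and 128 all map to set 0 when there are 4 sets of 16-byte lines.
cache = TwoWayCache(num_sets=4, line_size=16)
outcomes = [cache.access(a) for a in (0, 64, 0, 128, 64)]
```

    In the access sequence above, the third block to arrive in set 0 evicts address 64 rather than address 0, because address 0 was touched more recently.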

    A data-flow execution engine for scalable embedded computing

    Nowadays, embedded systems are increasingly used in the world of distributed computing to provide more computational power without having to change the whole system and the programming model. We propose a DataFlow Execution Engine (DEE) to spawn asynchronous, data-driven threads among embedded cores to achieve a seamless distribution of threads without the need for a distributed programming model. Our idea relies on the creation of a hardware scheduler that can handle creation, thread dependency, and locality of many fine-grained tasks. We present an initial evaluation of our DEE, which is suited for FPGA implementation. Our initial results show the importance of hardware-based support for such a thread execution model.

    Fast and Accurate Multivariate Gaussian Modeling of Protein Families: Predicting Residue Contacts and Protein-Interaction Partners

    In the course of evolution, proteins show a remarkable conservation of their three-dimensional structure and their biological function, leading to strong evolutionary constraints on the sequence variability between homologous proteins. Our method aims at extracting such constraints from rapidly accumulating sequence data, and thereby at inferring protein structure and function from sequence information alone. Recently, global statistical inference methods (e.g. direct-coupling analysis, sparse inverse covariance estimation) have achieved a breakthrough towards this aim, and their predictions have been successfully implemented into tertiary and quaternary protein structure prediction methods. However, due to the discrete nature of the underlying variable (amino-acids), exact inference requires exponential time in the protein length, and efficient approximations are needed for practical applicability. Here we propose a very efficient multivariate Gaussian modeling approach as a variant of direct-coupling analysis: the discrete amino-acid variables are replaced by continuous Gaussian random variables. The resulting statistical inference problem is efficiently and exactly solvable. We show that the quality of inference is comparable or superior to that achieved by mean-field approximations to inference with discrete variables, as done by direct-coupling analysis. This is true for (i) the prediction of residue-residue contacts in proteins, and (ii) the identification of protein-protein interaction partners in bacterial signal transduction. An implementation of our multivariate Gaussian approach is available at the website http://areeweb.polito.it/ricerca/cmp/cod
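
    The statement that the Gaussian inference problem is "efficiently and exactly solvable" can be made concrete with the textbook multivariate Gaussian, written out below. This is a generic sketch, not the paper's exact formulation: the symbols $x$, $N$, $M$, $\mu$, $C$, and $J$ are introduced here for illustration, and the continuous encoding of amino acids, any sequence reweighting, and any prior or regularization used by the authors are not given in the abstract.

```latex
% x in R^N: continuous encoding of one aligned sequence (encoding is an assumption)
P(x \mid \mu, C) = \frac{1}{\sqrt{(2\pi)^{N}\det C}}
  \exp\!\left(-\tfrac{1}{2}\,(x-\mu)^{\top} C^{-1} (x-\mu)\right)

% Maximum-likelihood estimates from M sequences x^{(1)}, \dots, x^{(M)}
\hat{\mu} = \frac{1}{M}\sum_{m=1}^{M} x^{(m)}, \qquad
\hat{C} = \frac{1}{M}\sum_{m=1}^{M}
  \bigl(x^{(m)}-\hat{\mu}\bigr)\bigl(x^{(m)}-\hat{\mu}\bigr)^{\top}

% Pairwise couplings are read off the precision matrix (up to sign convention)
J = \hat{C}^{-1}
```

    In this generic form, inference reduces to estimating a mean and a covariance and inverting one matrix, which takes polynomial rather than exponential time. The plain sample covariance is singular when $M < N$, so some regularization is needed in practice before inverting.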

    An FPGA-based Scalable Hardware Scheduler for Data-Flow Models

    This paper presents a scheduler for Data-Flow threads implemented in reconfigurable logic, to be deployed on Reconfigurable MPSoCs (i.e., Multi-Processing System on Chips with FPGA). "Data-Flow threads" (DF-Threads) is a novel execution model for mapping threads on local or distributed cores transparently to the programmer. This model can be parallelized massively among different cores, and it handles hundreds of thousands or more Data-Flow threads, together with their associated data frames, distributing them both within a local node and through the network to other nodes in a transparent way. The Hardware Scheduler (HS) is designed to be used in the Programmable Logic (PL) of MPSoC FPGAs, and it interacts with the GPP cores, providing them with Data-Flow threads that are ready to be executed. The overall design is modeled and tested through the HPLabs COTSon simulator. Here we use the Block Matrix Multiply benchmark to analyze the potential of the proposed model.

    An Overview on Current Non-invasive Diagnostic Devices in Oral Oncology

    Oral squamous cell carcinoma (OSCC) is the most common head and neck malignancy, and despite advances in cancer therapies, the overall 5-year survival rate has remained below 50% over the past decades. OSCC is typically preceded by potentially malignant disorders (PMD), but distinguishing high-risk from low-risk PMD is challenging. In recent years, several diagnostic methods, such as light-based detection systems (LBDS), have been proposed to facilitate the detection of OSCC and PMD. Furthermore, the recent evolution of nanotechnology may provide new opportunities to detect PMD and OSCC at an early stage. Indeed, several preclinical studies showed the potential of nanotechnology to enhance diagnostic accuracy. For these reasons, it is fundamental to conduct studies to evaluate the efficacy of nanotechnology implementation in LBDS. The aim of this article is to review the current literature on LBDS and to provide a summary of the sensitivity and specificity of each technique, and possible future applications of nanotechnologies. The LBDS showed great potential for screening and monitoring oral lesions, but there are several factors that hinder an extensive use of these devices. These devices seem to be useful in assessing lesion margins that must be biopsied. However, to date, conventional oral examination and tissue biopsy remain the gold standard for OSCC diagnosis. The use of nanotechnologies could be the next step in the evolution of LBDS, thus providing devices that can help clinicians to detect and better monitor oral lesions.

    A highly-detailed anatomical study of left atrial auricle as revealed by in-vivo computed tomography

    The left atrial auricle (LAA) is the main source of intracardiac thrombi, which contribute significantly to the total number of stroke cases. It is also considered a major site of origin for atrial fibrillation in patients undergoing ablation procedures. The LAA is known to have a high degree of morphological variability, with shape and structure identified as important contributors to thrombus formation. A detailed understanding of LAA form, dimension, and function is crucial for radiologists, cardiologists, and cardiac surgeons. This review describes the normal anatomy of the LAA as visualized through multiple imaging techniques such as computed tomography (CT), magnetic resonance imaging (MRI), and echocardiography. Special emphasis is devoted to a discussion of how the morphological characteristics of the LAA are closely related to the likelihood of developing LAA thrombi, including insights into LAA embryology.